Understanding CNN fragility when learning with imbalanced data
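Abstract
Convolutional neural networks (CNNs) have achieved impressive results on imbalanced image data, but they still have difficulty generalizing to minority classes, and their decisions are difficult to interpret. These problems are related because the method by which CNNs generalize to minority classes, which requires improvement, is wrapped in a black box. To demystify CNN decisions on imbalanced data, we focus on their latent features. Although CNNs embed the pattern knowledge learned from a training set in the model parameters, the effect of this knowledge is contained in feature and classification embeddings (FE and CE). These embeddings can be extracted from a trained model, and their global class properties (e.g., frequency, magnitude, and identity) can be analyzed. We find that important information regarding the ability of a neural network to generalize to minority classes resides in the class top-K CE and FE. We show that a CNN learns a limited number of class top-K CE per category, and that their number and magnitudes vary depending on whether the same class is balanced or imbalanced. We hypothesize that latent class diversity is as important as the number of class examples, which has important implications for re-sampling and cost-sensitive methods. These methods generally focus on rebalancing model weights, class numbers, and margins rather than on diversifying class latent features. We also demonstrate that a CNN has difficulty generalizing to test data if the magnitudes of its top-K latent features do not match those of the training set. Our experiments use three popular image datasets and two cost-sensitive algorithms commonly employed in imbalanced learning.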
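The abstract describes extracting feature and classification embeddings from a trained CNN and analyzing per-class top-K properties such as frequency and magnitude. The NumPy sketch below illustrates that kind of analysis under stated assumptions; it is not the authors' code. FE is assumed to be the penultimate-layer activation vector of each example, and CE is defined here (an assumption, not necessarily the paper's definition) as the elementwise product of FE with the final linear layer's weight row for the example's own class. The helper names `topk_indices`, `class_topk_profile`, `classification_embeddings` and `topk_magnitude_gap` are hypothetical, and random arrays stand in for embeddings extracted from a real model.

```python
import numpy as np


def topk_indices(emb: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k largest entries in each row of an (N, D) embedding matrix."""
    # argpartition selects the k largest per row without a full sort;
    # the order among those k indices is unspecified.
    return np.argpartition(emb, -k, axis=1)[:, -k:]


def class_topk_profile(emb: np.ndarray, labels: np.ndarray, cls: int, k: int):
    """Summarize the top-k latent features of one class (hypothetical helper).

    Returns:
        freq: length-D array; freq[d] counts the examples of `cls`
              whose top-k entries include feature d.
        mean_mag: mean value of the top-k entries over those examples.
    """
    rows = emb[labels == cls]
    idx = topk_indices(rows, k)
    freq = np.bincount(idx.ravel(), minlength=emb.shape[1])
    mean_mag = float(np.take_along_axis(rows, idx, axis=1).mean())
    return freq, mean_mag


def classification_embeddings(fe: np.ndarray, w: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Assumed CE definition: each example's FE times the final linear layer's
    weight row for its own class, i.e. the per-feature terms that sum to that
    class's logit (bias excluded)."""
    return fe * w[labels]


def topk_magnitude_gap(train_emb, train_y, test_emb, test_y, cls: int, k: int) -> float:
    """Training minus test mean top-k magnitude for one class. A large positive
    gap means test examples activate their strongest features more weakly."""
    _, train_mag = class_topk_profile(train_emb, train_y, cls, k)
    _, test_mag = class_topk_profile(test_emb, test_y, cls, k)
    return train_mag - test_mag


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, K = 32, 4
    # Synthetic stand-ins: 200 majority (class 0) and 10 minority (class 1)
    # examples with non-negative, ReLU-like feature embeddings.
    y = np.array([0] * 200 + [1] * 10)
    fe = np.maximum(rng.normal(size=(len(y), D)), 0.0)
    w = rng.normal(size=(2, D))  # stand-in for the final linear layer's weights

    ce = classification_embeddings(fe, w, y)
    for c in (0, 1):
        freq, mag = class_topk_profile(ce, y, c, K)
        n_distinct = int((freq > 0).sum())  # features that ever reach the top-K
        print(f"class {c}: {n_distinct} distinct top-{K} CE features, mean magnitude {mag:.3f}")
```

Counting how many distinct features ever enter a class's top-K is one rough proxy for latent class diversity, and comparing `topk_magnitude_gap` across classes is one way to probe the train/test magnitude mismatch the abstract describes; neither is claimed to be the paper's exact procedure.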
Similar resources
When Does Imbalanced Data Require more than Cost-Sensitive Learning?
Most classification algorithms expect the frequency of examples from each class to be roughly the same. However, this is rarely the case for real-world data where very often the class probability distribution is nonuniform (or, imbalanced). For these applications, the main problem is usually the fact that the costs of misclassifying examples belonging to rare classes differ significantly from t...
Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining
This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...
Mining Imbalanced Data with Learning Classifier Systems
This chapter investigates the capabilities of XCS for mining imbalanced datasets. Initial experiments show that, for moderate and high class imbalances, XCS tends to evolve a large proportion of overgeneral classifiers. Theoretical analyses are developed, deriving an imbalance bound up to which XCS should be able to differentiate between accurate and overgeneral classifiers. Some relevant param...
Learning When Data Sets are Imbalanced and When Costs are Unequal and Unknown
The problem of learning from imbalanced data sets, while not the same problem as learning when misclassification costs are unequal and unknown, can be handled in a similar manner. That is, in both contexts, we can use techniques from ROC analysis to help with classifier design. We present results from two studies in which we dealt with skewed data sets and unequal, but unknown costs of error. W...
Learning to Classify Data Streams with Imbalanced Class Distributions
Streaming data is pervasive in a multitude of data mining applications. One fundamental problem in the task of mining streaming data is distributional drift over time. Streams may also exhibit high and varying degrees of class imbalance, which can further complicate the task. In scenarios like these, class imbalance is particularly difficult to overcome and has not been as thoroughly studied. I...
Journal
Journal title: Machine Learning
Year: 2023
ISSN: 0885-6125, 1573-0565
DOI: https://doi.org/10.1007/s10994-023-06326-9